DAY 17 - dbt 中階 (4) - dbt Graph - iT 邦幫忙::一起幫忙解決難題，拯救 IT 人的一天

2023 iThome 鐵人賽

DAY 17

AI & Data

dbt: 告別過時的SQL開發流程系列第 17 篇

DAY 17 - dbt 中階 (4) - dbt Graph

15th鐵人賽 dbt dbt core dbt cloud

Stacy Lo

團隊dbt 和 dbt 以外有趣的事

2023-10-02 00:08:49

938 瀏覽

分享至

如果在 SQL 想要列出所有的 tables, views 或 columns，我們可以使用 information_schema。

若是在作業系統中想要 print 出所有的檔名，我們會用 ls 指令，或是 Python, VBA 等任何程式語言。

那在 dbt 要如何列出所有 model 呢？dbt model 和 SQL tables, views 不太一樣，如果用 dbt Cloud 的話，models 也不是本機實體的資料夾/檔案。

答案就是：dbt Graph

如何利用 dbt Graph

Graph 完整的資料

直接 compile {{ graph }} 就是完整的資料（但是難以閱讀，需要整理）

用 for 迴圈整理

可以用 for 迴圈一個一個列出 graph.nodes 的 key

{% for node in graph.nodes%}
{{ node }}
{% endfor %}

也可以加上 .values() 列出 graph.nodes 所有的資訊。

{% for node in graph.nodes.values()%}
{{ node }}
{% endfor %}

改寫成 select 的方式

改寫成 select 的方式，不僅可以跑成 SQL 的查詢結果容易閱讀，更能建成 model 方便後續使用。

{% for node in graph.nodes.values()%}

select "{{ node.name }}" as node_name
{%- if not loop.last %} union all {%- endif -%}

{% endfor %}

Compiled Code



select "customers" as node_name union all

select "stg_customers" as node_name union all

select "seed_order_statuses" as node_name union all

select "unique_customers_customer_id" as node_name union all

select "not_null_customers_customer_id" as node_name union all

select "unique_stg_customers_customer_id" as node_name union all

select "not_null_stg_customers_customer_id" as node_name union all

select "stg_orders" as node_name union all

select "unique_stg_orders_order_id" as node_name union all

select "not_null_stg_orders_order_id" as node_name union all

select "accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned" as node_name union all

select "not_null_stg_orders_customer_id" as node_name union all

select "relationships_stg_orders_customer_id__customer_id__ref_stg_customers_" as node_name

Preview

加入更多欄位

{% for node in graph.nodes.values()%}

select
"{{ node.resource_type }}" as node_resource_type,
"{{ node.name }}" as node_name,
"{{ node.path }}" as node_path
{% if not loop.last %} union all {%- endif -%}

{% endfor %}

Compiled Code



select
"model" as node_resource_type,
"customers" as node_name,
"customers.sql" as node_path
 union all

select
"model" as node_resource_type,
"stg_customers" as node_name,
"staging/stg_customers.sql" as node_path
 union all

select
"seed" as node_resource_type,
"seed_order_statuses" as node_name,
"seed_order_statuses.csv" as node_path
 union all

select
"test" as node_resource_type,
"unique_customers_customer_id" as node_name,
"unique_customers_customer_id.sql" as node_path
 union all

select
"test" as node_resource_type,
"not_null_customers_customer_id" as node_name,
"not_null_customers_customer_id.sql" as node_path
 union all

select
"test" as node_resource_type,
"unique_stg_customers_customer_id" as node_name,
"unique_stg_customers_customer_id.sql" as node_path
 union all

select
"test" as node_resource_type,
"not_null_stg_customers_customer_id" as node_name,
"not_null_stg_customers_customer_id.sql" as node_path
 union all

select
"model" as node_resource_type,
"stg_orders" as node_name,
"staging/stg_orders.sql" as node_path
 union all

select
"test" as node_resource_type,
"unique_stg_orders_order_id" as node_name,
"unique_stg_orders_order_id.sql" as node_path
 union all

select
"test" as node_resource_type,
"not_null_stg_orders_order_id" as node_name,
"not_null_stg_orders_order_id.sql" as node_path
 union all

select
"test" as node_resource_type,
"accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned" as node_name,
"accepted_values_stg_orders_4f514bf94b77b7ea437830eec4421c58.sql" as node_path
 union all

select
"test" as node_resource_type,
"not_null_stg_orders_customer_id" as node_name,
"not_null_stg_orders_customer_id.sql" as node_path
 union all

select
"test" as node_resource_type,
"relationships_stg_orders_customer_id__customer_id__ref_stg_customers_" as node_name,
"relationships_stg_orders_96411fe0c89b49c3f4da955dfd358ba0.sql" as node_path

Preview

題外話：清理目標資料庫多餘的 table 和 view（孤兒物件）

每次我們執行 dbt run 或是 dbt build 的指令，dbt 會將我們的 models 打到目標資料庫 (create view 或 create table)。但如果 model 改名或刪除，先前建立過的 view/table 也不會被刪掉。這些 view/table 每次跑 dbt run 或 dbt build 的時候不會被更新到，它們沒有對應的來源 model，可以稱之為孤兒物件。

在開發環境可以把整個 schema 刪掉重跑，就能刪除這些孤兒物件。

正式環境則不能隨便刪除 schema，如果不清理的話，除了佔空間，還會造成許多問題。
舉例原本的 table 名稱是 order_rows 但在某次改版後，更名為 order_lines，如果在下游的 BI 報表，引用到了孤兒 order_rows，就會誤用一個沒有正常更新的資料源，這絕對是需要避免的困擾。

因此若要比對哪些 view/table 為孤兒，就可以用 graph nodes 和 information schema 交叉比對。